Abstract

In this work, we first review some extensions of the standard Hopfield
model in the low-storage limit, namely the correlated attractor case and
the multitasking case recently introduced by the authors. The former case
is based on a modification of the Hebbian prescription that induces a
coupling between consecutive patterns; this effect is tuned by a parameter
a. In the latter case, dilution is introduced in pattern entries, in such a
way that a fraction d of them is blank. Then, we merge these two
extensions to obtain a system able to retrieve several patterns in parallel,
where the quality of retrieval, encoded by the set of Mattis magnetizations
$\{m^\mu\}$, is reminiscent of the correlation among patterns. By tuning the
parameters d and a, qualitatively different outputs emerge, ranging from
highly hierarchical to symmetric. The investigations are accomplished by
means of both numerical simulations and statistical mechanics analysis,
properly adapting a novel technique originally developed for spin glasses,
i.e. the Hamilton-Jacobi interpolation, with excellent agreement. Finally,
we show the thermodynamical equivalence of this associative network with
a (restricted) Boltzmann machine and study its stochastic dynamics to obtain
a dynamical picture as well, perfectly consistent with the static scenario
discussed earlier.



1 Introduction 

In the past century, the seminal works by Minsky and Papert [1], Turing [2] and
von Neumann [3] set the basis of modern artificial intelligence and, remarkably,
established a link between robotics and information theory [4]. Another
fundamental contribution in this sense was achieved by Hopfield [5], who, beyond
offering a simple mathematical prescription for the Hebbian rule for learning [5],
also pointed out that artificial neural networks can be embedded in a statistical
mechanics framework. The latter was rigorously settled by Amit, Gutfreund
and Sompolinsky (AGS) [5], ultimately reinforcing the bridge between
cybernetics and information theory [7], given the deep connection between the latter
and statistical mechanics [U H].

As a second-order result, artificial intelligence, whose development had been 
mainly due to mathematicians and engineers, became accessible to theoretical 
physicists too: in particular, when Hopfield published his celebrated paper, the
statistical mechanics of disordered systems (mainly spin glasses [§]) had just 
reached its maturity and served as a theoretical laboratory where AGS, as well 
as many others, gave rise to the mathematical backbone of these associative 
networks. 

In a nutshell, the standard Hopfield model can be described by a two-body
mean-field Hamiltonian (a Lyapunov cost function [5]), which somehow
interpolates between the one describing ferromagnetism, already introduced by Curie
and Weiss (CW) [10], and the one describing spin glasses developed by
Sherrington and Kirkpatrick (SK) [5]. Its dichotomic variables (termed "spins"
in the original CW or SK theories) are here promoted to binary neurons
(an "on/off" idealization of the more standard integrate-and-fire models
[11]) and the interaction matrix (called the synaptic matrix in this context) assumes
a (symmetrized) Hebbian form in which information, represented as a set of
patterns (namely vectors of ±1 random entries), is stored. One of the main goals
achieved by the statistical mechanics analysis of this model is a clear picture
where memory is no longer thought of as statically stored in a confined region
(somewhat like a hard disk), but is spread over the non-linear retroactive
synaptic loops connecting the neurons themselves. Furthermore, it has offered
a methodology in which puzzling questions, such as the memory capacity of the
network or its stability in the presence of noise, could finally be consistently
formulated.

The success of the statistical-mechanics analysis of neural networks is
confirmed by the fact that several variations on the theme followed and many scientific
journals dedicated to this very subject arose. For instance, Amit,
Cugliandolo, Griniasty and Tsodyks [12] [13] [14] considered a simple modification of
the Hebbian prescription, able to capture the spatial correlation between
attractors observed experimentally as a consequence of proper learning. More
precisely, a scalar correlation parameter a is introduced and, when its value
exceeds a threshold (whose value contains valuable physics, as we will explain),
the retrieval of a given pattern induces the simultaneous retrieval of its
most-correlated counterparts, in a hierarchical way, hence bypassing the standard
single retrieval of the original framework (the so-called "pure state").

In another extension, proposed by some of the authors of the present paper
[15] [16], the hypothesis of strictly non-zero pattern entries is relaxed in such a
way that a fraction d of entries is blank. This is shown to imply retrieval of a
given pattern without exhausting all the neurons and, following thermodynamic
prescriptions (free energy minimization), the remaining free neurons arrange
cooperatively to retrieve further patterns, again in a hierarchical fashion. As
a result, the network is able to perform a parallel retrieval of uncorrelated
patterns.

Here we consider a Hopfield network exhibiting both correlated patterns
and diluted pattern entries, and we study its equilibrium properties through
statistical mechanics and Monte Carlo simulations, focusing on the low-storage
regime. The analytical investigation is accomplished through a novel
mathematical methodology, i.e., the Hamilton-Jacobi technique (developed earlier in
[171 [10] ITS]), which is also carefully explained. The emerging behavior of the
system is found to depend qualitatively on a and on d, and we can distinguish
different kinds of fixed points, corresponding to the so-called pure state or to
hierarchical states referred to as "correlated", "parallel" or "dense". In
particular, hierarchy among patterns is stronger for a small degree of dilution, while at
large d the hierarchy is smoother.

Moreover, we consider the equivalence between the Hopfield model and a
class of Boltzmann machines [19] developed in [20l EI] and we show that this
equivalence is rather robust and can also be established for the correlated and
diluted Hopfield model studied here. Interestingly, this approach allows the
investigation of dynamic properties of the model, which are discussed as well.

The paper is organized as follows. In section 2, starting from the low-storage
Hopfield model, we review, quickly and pedagogically, the three extensions (and
related phase diagrams) of interest, namely the high-storage case (tuned by a
scalar parameter $\alpha$), the correlated case (tuned by a scalar parameter a) and the
parallel case (tuned by a scalar parameter d). In Sec. 3, we move to the general
scenario and we present our main results both theoretically and numerically.
Then, in Sec. 4, we analyze the system from the perspective of Boltzmann
machines. Finally, Sec. 5 is devoted to a summary and a discussion of results.
The technical details of our investigations are all collected in the appendices.



2 Modelization 

Here, we briefly describe the main features of the conventional Hopfield model
(for an extensive treatment see, e.g., [tJl [22]).

Let us consider a network of $N$ neurons. Each neuron $\sigma_i$ can take two states,
namely, $\sigma_i = +1$ (firing) and $\sigma_i = -1$ (quiescent). Neuronal states are given by
the set of variables $\sigma = \{\sigma_1, ..., \sigma_N\}$. Each neuron is located on a complete graph
and the synaptic connection between two arbitrary neurons, say, $\sigma_i$ and $\sigma_j$, is
defined by the following Hebb rule:

$$ J_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu, \qquad (1) $$

where $\xi^\mu = (\xi_1^\mu, ..., \xi_N^\mu)$ denotes the set of memorized patterns, each specified by
a label $\mu = 1, ..., P$. The entries are usually dichotomic, i.e., $\xi_i^\mu \in \{+1, -1\}$,
chosen randomly and independently with equal probability, namely, for any $i$
and $\mu$,

$$ P(\xi_i^\mu) = \frac{1}{2}\left(\delta_{\xi_i^\mu - 1} + \delta_{\xi_i^\mu + 1}\right), \qquad (2) $$

where the Kronecker delta $\delta_x$ equals 1 iff $x = 0$, otherwise it is zero. Patterns are
usually assumed to be quenched, that is, the performance of the network is analyzed
keeping the synaptic values fixed.

The Hamiltonian describing this system is

$$ H(\sigma, \xi) = -\sum_{i=1}^{N} \sum_{j \neq i}^{N} J_{ij} \sigma_i \sigma_j = -\frac{1}{N} \sum_{i \neq j}^{N,N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \sigma_i \sigma_j, \qquad (3) $$

so that the field acting on spin $i$ is

$$ h_i(\sigma, \xi) = \sum_{j \neq i}^{N} J_{ij} \sigma_j. \qquad (4) $$

The evolution of the system is ruled by a stochastic dynamics, according to
which the probability that the activity of a neuron $i$ assumes the value $\sigma_i$ is

$$ P(\sigma_i; \sigma, \xi, \beta) = \frac{1}{2}\left[1 + \tanh(\beta h_i \sigma_i)\right], \qquad (5) $$

where $\beta$ tunes the level of noise such that for $\beta \to 0$ the system behaves
completely randomly, while for $\beta \to \infty$ it becomes noiseless and deterministic; note
that the noiseless limit of Eq. (5) is $\sigma_i(t + 1) = \mathrm{sign}[h_i(t)]$.

The main feature of the model described by Eqs. (3) and (5) is its ability
to work as an associative memory. More precisely, the patterns are said to
be memorized if each of the network configurations $\sigma_i = \xi_i^\mu$ for $i = 1, ..., N$,
for every one of the P patterns labelled by $\mu$, is a fixed point of the dynamics.
Introducing the overlap $m^\mu$ between the state of neurons $\sigma$ and one of the
patterns $\xi^\mu$ as

$$ m^\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \sigma_i, \qquad (6) $$

a pattern $\mu$ is said to be retrieved if, in the thermodynamic limit, $m^\mu = O(1)$.
Given the definition (6), the Hamiltonian (3) can also be written as

$$ H(\sigma, \xi) = -N \sum_{\mu=1}^{P} (m^\mu)^2 + P = -N m^2 + P, \qquad (7) $$

and, similarly,

$$ h_i(\sigma, \xi) = \sum_{\mu=1}^{P} \xi_i^\mu m^\mu - \frac{P}{N} \sigma_i. \qquad (8) $$
The analytical investigation of the system is usually accomplished in the
thermodynamic limit $N \to \infty$, consistently with the fact that real networks
comprise a very large number of neurons. Dealing with this limit, it
is convenient to specify the relative number of stored patterns, namely P/N,
and to define the ratio $\alpha = \lim_{N \to \infty} P/N$. The case $\alpha = 0$, corresponding to
a number P of stored patterns scaling sub-linearly with respect to the number
of neurons N, is often referred to as "low storage". Conversely, the
case of finite $\alpha$ is often referred to as "high storage".

The overall behavior of the system is ruled by the parameters $T = 1/\beta$ (fast
noise) and $\alpha$ (slow noise) and it can be summarized by means of the phase
diagram shown in Fig. 1. Notice that for $\alpha = 0$, the so-called pure-state ansatz

$$ m = (1, 0, ..., 0), \qquad (9) $$

always corresponds to a stable solution for T < 1; the order of the entries
is purely conventional and here we assume that the first pattern is the one
stimulated.
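As an illustration of the Hebb rule (1), the overlap (6) and the noiseless limit of the dynamics (5), the following minimal Python sketch builds the couplings for random patterns and checks that a stored pattern is left unchanged by one noiseless update. The function names, the sizes N and P, the random seed, the zeroed self-couplings and the choice of a parallel update are assumptions made here for illustration; they are not taken from the original work.

```python
import numpy as np

def hebbian_couplings(xi):
    """Hebb rule J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with self-couplings set to zero."""
    n_neurons = xi.shape[1]
    J = xi.T @ xi / n_neurons
    np.fill_diagonal(J, 0.0)
    return J

def overlaps(sigma, xi):
    """Mattis overlaps m^mu = (1/N) sum_i xi_i^mu sigma_i."""
    return xi @ sigma / xi.shape[1]

def noiseless_update(sigma, J):
    """Parallel update sigma_i <- sign(h_i); a neuron with zero field keeps its state."""
    h = J @ sigma
    return np.where(h == 0, sigma, np.sign(h)).astype(int)

# Illustrative sizes (assumed): low storage, P much smaller than N.
rng = np.random.default_rng(0)
N, P = 2000, 3
xi = rng.choice([-1, 1], size=(P, N))   # patterns drawn as in Eq. (2)
J = hebbian_couplings(xi)

# Start exactly on the first pattern and apply one noiseless update.
sigma = noiseless_update(xi[0].copy(), J)
m = overlaps(sigma, xi)
print(m)
```

With so few patterns the cross-talk between patterns is small compared with the signal, so the first entry of `m` stays at 1 while the others remain of order $1/\sqrt{N}$, which is the pure-state picture of Eq. (9).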



3 Generalizations 

Figure 1: At a high level of noise the system is ergodic (PM) and no retrieval
can be accomplished ($m^\mu = 0$, $\forall \mu$). By decreasing the noise level below a critical
temperature (dashed line) one enters a "spin-glass" phase (SG), where there is
no retrieval ($m^\mu = 0$), yet the system is no longer fully ergodic. Now, if the
number of patterns is small enough ($\alpha < 0.138$), by further decreasing the level
of noise, one eventually crosses a line (solid curve), below which the system
develops 2P meta-stable retrieval states, each of which can be separately retrieved with
a macroscopic overlap ($m^\mu \neq 0$). Finally, when $\alpha$ is small enough ($\alpha < 0.05$), a
further transition occurs at a critical temperature (dotted line), such that below
this line the retrieval states become global minima (R).

The Hebbian coupling in Eq. (1) can be generalized in order to include possibly
more complex combinations among patterns; for instance, we can write

$$ J_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{P} \xi_i^\mu X_{\mu\nu} \xi_j^\nu, \qquad (10) $$

where X is a symmetric matrix; of course, by taking X equal to the identity
matrix we recover Eq. (1). A particular example of generalized Hebbian kernel
was introduced in [12], and further investigated in [13] [14], as



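$$ X = \begin{pmatrix} 1 & a & 0 & \cdots & a \\ a & 1 & a & \cdots & 0 \\ 0 & a & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & a \\ a & 0 & \cdots & a & 1 \end{pmatrix}. \qquad (11) $$

In this way the coupling between two arbitrary neurons turns out to be

$$ J_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \left[ \xi_i^\mu \xi_j^\mu + a \left( \xi_i^{\mu+1} \xi_j^\mu + \xi_i^{\mu-1} \xi_j^\mu \right) \right]. \qquad (12) $$

Hence, the memorized patterns are meant as a cyclic sequence and each of them
is coupled to its consecutive patterns with a strength a, in addition to the
usual auto-associative term.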
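The construction of the cyclic matrix of Eq. (11) and of the couplings of Eq. (12) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the sizes and the seed are assumptions, the self-couplings are set to zero by choice, and the matrix builder assumes P ≥ 3 so that the two neighbours of a pattern are distinct.

```python
import numpy as np

def correlation_matrix(P, a):
    """Cyclic P x P matrix of Eq. (11): 1 on the diagonal, a between consecutive labels.

    Assumes P >= 3, so that the labels mu+1 and mu-1 (mod P) are different.
    """
    X = np.eye(P)
    for mu in range(P):
        X[mu, (mu + 1) % P] += a
        X[mu, (mu - 1) % P] += a
    return X

def correlated_couplings(xi, a):
    """Generalized Hebb rule J = xi^T X xi / N (Eq. (10) with the X of Eq. (11))."""
    P, N = xi.shape
    J = xi.T @ correlation_matrix(P, a) @ xi / N
    np.fill_diagonal(J, 0.0)   # no self-coupling (a choice made in this sketch)
    return J

# Illustrative sizes (assumed).
rng = np.random.default_rng(0)
P, N, a = 5, 50, 0.3
xi = rng.choice([-1, 1], size=(P, N))
J = correlated_couplings(xi, a)
print(correlation_matrix(P, a))
```

Writing the couplings as a matrix product makes it easy to verify numerically that Eq. (10) with this X coincides, entry by entry, with the explicit sum of Eq. (12).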

This modification of the Hopfield model was proposed in [12] to capture some
basic experimental features about coding in the temporal cortex of the monkey
[23] [24]: a temporal correlation among visual stimuli can evoke a neuronal
activity displaying spatial correlation. Indeed, the synaptic matrix (12) is able
to reproduce this experimental feature in both low [THEI] and high [15] storage
regimes.

For the former case, one derives the mean-field equations determining the
attractors, which, since the matrix is symmetric, are simple fixed points. In the
limit of a large network, they read [12]

$$ m^\mu = \left\langle \xi^\mu \tanh\left\{ \beta \sum_{\nu=1}^{P} m^\nu \left[ \xi^\nu + a \left( \xi^{\nu+1} + \xi^{\nu-1} \right) \right] \right\} \right\rangle_\xi, \qquad (13) $$

where $\langle \cdot \rangle_\xi$ means an average over the quenched distribution of patterns.

Figure 2: Phase diagram for the correlated model with low storage (P = 13), as
originally reported in [13]. At a high level of noise the system is ergodic (PM)
and it eventually reaches a state with $m^\mu = 0$, $\forall \mu$. At smaller temperatures
(below the dashed line), the system evolves to a so-called symmetric state (S),
characterized by, approximately, $m^\mu = m \neq 0$, $\forall \mu$. Then, if a is small enough,
by further reducing the temperature (below the solid line), the network behaves
as a Hopfield network and the pure state retrieval (PS) can be recovered. On the
other hand, if a is larger, as the temperature is reduced, correlated attractors
(C) appear according to Eq. (14). Then, if the temperature is further lowered,
the system recovers the Hopfield-like regime. If a > 1/2, the pure state regime
is no longer achievable.

In [12], the previous equation was solved by starting from a pure pattern
state and iterating until convergence. In the noiseless case, where the hyperbolic
tangent can be replaced by the sign function, the pure state ansatz is still a fixed
point of the dynamics if $a \in [0, 1/2)$, while if $a \in (1/2, 1]$, the system evolves to
an attractor characterized by the Mattis magnetizations (assuming P > 10, see
Appendix A)

$$ m = \frac{1}{2^7} (77, 51, 13, 3, 1, 0, ..., 0, 1, 3, 13, 51), \qquad (14) $$

namely, the overlap with the pattern used as stimulus is the largest and the
overlap with the neighboring patterns in the stored sequence decays
symmetrically until vanishing at a distance of 5. Some insights into these results can be
found in Appendix A.
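The iteration described above can be sketched numerically. The Python snippet below is a minimal illustration, not the procedure of [12]: it evaluates the noiseless limit of Eq. (13) exactly by enumerating all $2^P$ pattern configurations, starts from the pure state and iterates the map. The function names, the values P = 13 and a = 0.7, the iteration cap and the treatment of a vanishing field (which contributes zero) are assumptions of this sketch.

```python
import itertools
import numpy as np

def correlation_matrix(P, a):
    """Cyclic matrix of Eq. (11); assumes P >= 3."""
    X = np.eye(P)
    for mu in range(P):
        X[mu, (mu + 1) % P] += a
        X[mu, (mu - 1) % P] += a
    return X

def noiseless_map(m, a):
    """One iteration of Eq. (13) at T = 0, averaging over all 2^P rows xi in {-1,+1}^P.

    The argument of the sign is sum_nu m^nu [xi^nu + a (xi^{nu+1} + xi^{nu-1})],
    which equals xi . (X m). A row with exactly zero field contributes zero.
    """
    P = len(m)
    configs = np.array(list(itertools.product([-1, 1], repeat=P)))
    h = configs @ (correlation_matrix(P, a) @ m)
    return (configs * np.sign(h)[:, None]).mean(axis=0)

P, a = 13, 0.7                      # illustrative values with a in (1/2, 1]
pure = np.zeros(P)
pure[0] = 1.0
first = noiseless_map(pure, a)      # result of the first iteration

m = pure
for _ in range(50):                 # iterate until the map stops changing m
    m_new = noiseless_map(m, a)
    if np.allclose(m_new, m):
        break
    m = m_new

# Rescaled magnetizations, to be compared with the integer vector in Eq. (14).
print(np.round(m * 2**7, 2))
```

After the first iteration the stimulated pattern and its two neighbours all have overlap 1/2, which already shows how the pure state is destabilized for a > 1/2; the subsequent iterations spread the overlap further along the sequence.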

In the presence of noise, one can distinguish four different regimes according
to the value of the parameters a and T. The overall behavior of the system is
summarized in the plot of Fig. 2. A similar phase diagram, as a function of $\alpha$
and a, was drawn in [14] for the high-storage regime.



A further generalization can be implemented in order to account for the fact
that the pattern distribution may not be uniform or that pattern entries may possibly
be blank. For instance, in the latter case one may replace Eq. (2) by

$$ P(\xi_i^\mu) = \frac{1-d}{2} \delta_{\xi_i^\mu - 1} + \frac{1-d}{2} \delta_{\xi_i^\mu + 1} + d\, \delta_{\xi_i^\mu}, \qquad (15) $$

where d encodes the degree of dilution of pattern entries. This kind of extension
has strong biological motivations, too. In fact, the distribution in Eq. (2)
necessarily implies that the retrieval of a single pattern employs all the available
neurons, so that no resources are left for further tasks. Conversely, with Eq. (15)
the retrieval of one pattern still leaves neurons available, which can be used to
recall other patterns. The resulting network is therefore able to process several
patterns simultaneously. The behavior of this system is investigated in depth in
[15] [16], as far as the low-storage regime is concerned.

Figure 3: At high levels of noise the system is ergodic (PM) and below the
temperature T = 1 - d (continuous line) it can develop a pure state retrieval (PS)
or a symmetric retrieval (S), according to whether the dilution is small or large,
respectively. At small temperatures and intermediate degree of dilution the
system can develop a parallel (P) retrieval, according to Eq. (16). The continuous
line works for any value of P, while the dotted and dashed lines were obtained
numerically for the case P = 3.

In particular, it was shown both analytically (via density of states analysis)
and numerically (via Monte Carlo simulations) that the system evolves to an
equilibrium state where several patterns are simultaneously retrieved; in the
noiseless limit T = 0, the equilibrium state is characterized by a hierarchical
overlap

$$ m = (1 - d)(1, d, d^2, ..., 0), \qquad (16) $$

hereafter referred to as the "parallel ansatz", while, in the presence of noise, one
can distinguish different phases as shown by the diagram in Fig. 3.
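The leading entries of the parallel ansatz can be checked with a short numerical experiment. The Python sketch below draws diluted patterns from Eq. (15) and builds, by hand, a configuration in which each neuron aligns with the first pattern (in label order) whose entry on that neuron is not blank; this hand-made hierarchical state is an assumption used here to illustrate Eq. (16), not the equilibrium state obtained from the dynamics. Function names, sizes and the seed are likewise illustrative.

```python
import numpy as np

def sample_diluted_patterns(P, N, d, rng):
    """Entries drawn from Eq. (15): 0 with probability d, +1 or -1 with probability (1-d)/2."""
    return rng.choice([-1, 0, 1], size=(P, N), p=[(1 - d) / 2, d, (1 - d) / 2])

def hierarchical_state(xi, rng):
    """Each neuron copies the first non-blank entry among xi^1, xi^2, ...

    Neurons whose entries are blank in every pattern receive no signal
    and are left in a random state.
    """
    P, N = xi.shape
    sigma = rng.choice([-1, 1], size=N)
    for mu in reversed(range(P)):        # earlier patterns overwrite later ones
        mask = xi[mu] != 0
        sigma[mask] = xi[mu, mask]
    return sigma

rng = np.random.default_rng(1)
P, N, d = 4, 200_000, 0.4                # illustrative values (assumed)
xi = sample_diluted_patterns(P, N, d, rng)
sigma = hierarchical_state(xi, rng)

m = xi @ sigma / N                       # Mattis overlaps, Eq. (6)
ansatz = (1 - d) * d ** np.arange(P)     # (1 - d)(1, d, d^2, ...)
print(np.round(m, 3), np.round(ansatz, 3))
```

A neuron contributes to the overlap with pattern $\mu$ only if the entries of all the preceding patterns are blank and the entry of pattern $\mu$ is not, which happens with probability $d^{\mu-1}(1-d)$; this is the origin of the geometric hierarchy.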



To summarize, both generalizations discussed above, i.e. Eqs. (12) and (15),
induce the breakdown of the pure-state ansatz and allow the retrieval of
multiple patterns without falling into spurious states^1. In the following, we merge
such generalizations and consider a system exhibiting both correlation among
patterns and dilution in pattern entries.

^1 Since here we focus on the case $\alpha = 0$, spurious states are anyhow expected not to emerge,
since they only appear when pushing the system toward the spin-glass boundary at $\alpha > 0$.

Figure 4: Schematic representation of the general model considered in the
low-storage regime ($\alpha = 0$) and zero noise (T = 0). According to the value of the
parameters a (degree of correlation) and d (degree of dilution) the system can
recover different kinds of systems. The red curve corresponds to Eq. (20).



4 General Case 

Considering a low-storage regime with constant P, the general case with $a \in
[0, 1]$ and $d \in [0, 1]$ can be visualized as a square (see Fig. 4), where vertices and
sides correspond to either already-known or trivial cases, while the bulk will
be discussed in the following.

First, we notice that the coupling distribution is still normal with average
$\langle J \rangle_\xi = 0$ and variance $\langle J^2 \rangle_\xi = (1 + 2a^2)(1 - d)^2/(2P)$. The last result can be
understood easily by considering a random walk of length P: the walker is endowed
with a waiting probability d and at each unit time it performs three steps, one
of length 1 and two of length a.

Moreover, as shown in Appendix C, the self-consistency equations found in
[12] [15] can be properly extended to the case $d \neq 0$ as

$$ m = \left\langle \xi \tanh\left( \beta\, \xi \cdot X m \right) \right\rangle_\xi, \qquad (17) $$

where X is the matrix inducing the correlation (see Eq. (11)) and the brackets $\langle \cdot \rangle_\xi$
now mean an average over the possible realizations of dilution too.
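For small P, the average in Eq. (17) can be evaluated exactly by enumerating the $3^P$ possible rows of pattern entries with their probabilities from Eq. (15). The following Python sketch does this and iterates the resulting map; it is an illustrative fixed-point iteration under assumed parameter values (P = 3, a = 0, d = 0.2, $\beta$ = 100) and an assumed initial condition, not the numerical scheme used for the figures.

```python
import itertools
import numpy as np

def correlation_matrix(P, a):
    """Cyclic matrix of Eq. (11); assumes P >= 3."""
    X = np.eye(P)
    for mu in range(P):
        X[mu, (mu + 1) % P] += a
        X[mu, (mu - 1) % P] += a
    return X

def self_consistency_map(m, a, d, beta):
    """Right-hand side of Eq. (17), with the pattern average done by exact enumeration."""
    P = len(m)
    values = np.array([-1, 0, 1])
    probs = np.array([(1 - d) / 2, d, (1 - d) / 2])
    idx = np.array(list(itertools.product(range(3), repeat=P)))
    xi = values[idx]                       # all 3^P rows of entries
    weight = probs[idx].prod(axis=1)       # probability of each row, Eq. (15)
    h = xi @ (correlation_matrix(P, a) @ m)
    return (weight[:, None] * xi * np.tanh(beta * h)[:, None]).sum(axis=0)

# Starting exactly from the pure state keeps the other overlaps at zero, because
# neurons with a blank first entry then feel no field; a slightly perturbed,
# ordered initial condition (assumed here) lets them pick up further patterns.
m = np.array([0.9, 0.06, 0.03])
for _ in range(200):
    m = self_consistency_map(m, a=0.0, d=0.2, beta=100.0)
print(np.round(m, 4))
```

In this uncorrelated, low-noise example the iteration settles close to the parallel ansatz of Eq. (16); switching on a > 0 in the same function gives access to the merged model discussed in this section.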



4.1 Noise-free system: T = 0



Numerical solutions of the self-consistency equation (17) are shown in Figs. 5
and 6 as functions of d and a; several choices of P are also compared. Let us
focus on the case P = 5 (see Fig. 5) for a detailed description of the system
performance.



When a < 1/2, the parallel ansatz (16) works up to a critical dilution $d_1(a)$,
above which the gap between magnetizations, i.e. $|m^\mu - m^\nu|$, drops abruptly
and, for $d > d_1(a)$, all magnetizations are close and decrease monotonically
to zero. To see this, let us reshuffle the ansatz in (16), so as to account for the
hierarchy induced by correlation, that is,

$$ m = (1 - d)(1, d, d^3, d^4, d^2), \qquad (18) $$

which can be straightforwardly extended to any arbitrary P. Given the state
(18), the field acting on $\sigma_i$ is

$$ \begin{aligned} h_i &= \sum_{\mu=1}^{P} \left[ \xi_i^\mu + a \left( \xi_i^{\mu-1} + \xi_i^{\mu+1} \right) \right] m^\mu \\ &= (1 - d) \{ \xi_i^1 [1 + ad(1 + d)] + \xi_i^2 [d + a(1 + d^3)] + \xi_i^3 [d^3 + ad(1 + d^3)] \\ &\quad + \xi_i^4 [d^4 + ad^2(1 + d)] + \xi_i^5 [d^2 + a(1 + d^4)] \}. \end{aligned} \qquad (19) $$

Figure 5: Magnetization m versus degree of dilution for fixed P = 5 and
T = 0.0001; magnetizations related to different patterns are shown in different
colors. Several values of a are considered, as specified in each panel.

Figure 6: Magnetization m versus degree of dilution for fixed a = 0.3 and
T = 0.0001. Several values of P are considered for comparison: P = 5 (leftmost
panel), P = 7 (central panel) and P = 9 (rightmost panel). Magnetizations
related to different patterns are shown in different colors.

A signal-to-noise analysis suggests that this state is stable only for small degrees
of dilution. In fact, there exist configurations (e.g., $\xi_i^1 \neq 0$ and $\xi_i^\mu = -\xi_i^1$, for
any $\mu > 1$) possibly giving rise to a misalignment between $\sigma_i$ and $\xi_i^1$, with a
consequent reduction of $m^1$. This can occur only for $d > d_1(a)$, $d_1(a)$ being the
root of the equation $a = (1 - d - d^2 - d^3 - d^4)/[2(1 + d^3 + d^4)]$, as confirmed
numerically (see Fig. 5). In general, for arbitrary P, one has

$$ a = (1 - 2d + d^P)/[2(1 - d + d^3 - d^P)], \qquad (20) $$

which is plotted in Fig. 4.
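The critical dilution $d_1(a)$ can be obtained numerically by inverting Eq. (20). The Python sketch below uses a plain bisection; the function names, the bracketing interval (which stops just below d = 1, where Eq. (20) becomes a 0/0 expression) and the tolerance are assumptions of this illustration, and the root search is only meant for P = 5 and $0 \le a < 1/2$.

```python
def a_threshold(d, P=5):
    """Right-hand side of Eq. (20): the value of a at which dilution d becomes critical."""
    return (1 - 2 * d + d**P) / (2 * (1 - d + d**3 - d**P))

def critical_dilution(a, P=5, tol=1e-12):
    """Bisection for the root d_1(a) of a_threshold(d) = a on (0, 1)."""
    lo, hi = 0.0, 0.999          # d = 1 is excluded: Eq. (20) reads 0/0 there
    if a_threshold(lo, P) - a <= 0 or a_threshold(hi, P) - a >= 0:
        raise ValueError("no sign change: expected 0 <= a < 1/2")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a_threshold(mid, P) - a > 0:
            lo = mid             # still below the critical dilution
        else:
            hi = mid
    return 0.5 * (lo + hi)

for a in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(a, round(critical_dilution(a), 4))
```

For P = 5 the general expression reduces to the one quoted in the text, since multiplying numerator and denominator of $(1 - d - d^2 - d^3 - d^4)/[2(1 + d^3 + d^4)]$ by $(1 - d)$ gives Eq. (20); the printed values show the critical dilution decreasing as the correlation a grows.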

As $d > d_1(a)$, the magnetic configuration corresponding to Eq. (18)
undergoes an update where a fraction of the spins aligned with $\xi^1$ flips to agree
mostly with $\xi^2$ and $\xi^5$, and partly also with $\xi^3$ and $\xi^4$; as a result, $m^1$ is
reduced, while the other magnetizations are increased. Analogously, a fraction
of the spins aligned with $\xi^2$ is unstable and flips so as to align mostly with $\xi^5$;
consequently, there is a second-order correction which is upwards for $m^5$ (and
to a lesser extent for $m^1$, $m^3$ and $m^4$) and downwards for $m^2$. Similar arguments
apply for higher-order corrections.

At large values of dilution it is convenient to start from a different ansatz,
namely from the symmetric state

$$ m = m(1, 1, 1, 1, 1). \qquad (21) $$

This is expected to work properly when dilution is so large that the signal on any
arbitrary spin $\sigma_i$ stems from only one pattern, i.e., $\xi_i^\mu \neq 0$ and $\xi_i^\nu = 0$, $\forall \nu \neq \mu$.
This approximately corresponds to $d > 1 - 1/P$. The related magnetization is
therefore $m = d^4(1 - d)$. Now, reducing d, we can imagine, for simplicity, that
each spin $\sigma_i$ feels a signal from just two different patterns, say $\xi^1$ and $\xi^2$. The
prevailing pattern, say $\xi^1$, will increase the related magnetization and vice versa.
This induces the breakdown of the symmetric condition so that $m^1$ grows larger,
followed by $m^2$, $m^5$, and so on. The gap between magnetizations corresponds to
the fraction of spins that have broken the local symmetry, that is $d^3(1 - d)^2$.
Thus, magnetizations differ by the same amount and this configuration is stable
for large enough dilutions. By further increasing the density of non-null entries, each spin
has to manage a more complex signal and higher-order corrections arise. For
instance, one finds $m^1 = d^4(1 - d) + 4d^3(1 - d)^2 + 2d^2(1 - d)^3$, and similarly
for $m^{\mu > 1}$. This picture is consistent with numerical data and, for large enough
values of d, it is independent of a (see Fig. 5). Notice that in this regime of high
dilution hierarchy effects are smoothed down, that is, magnetizations are close
and we refer to this kind of state as "dense".



When a > 1/2, the parallel ansatz in Eq. (18) is no longer successful at small
d; in fact, correlation effects prevail and one should rather consider a perturbed
version of the correlated ansatz (14), that is,

$$ m = \frac{1 - d}{8} (5, 3, 1, 1, 3). \qquad (22) $$

We use (22) as the initial state for our numerical calculations finding, as fixed point,
$m^1 = (1 - d)5/8$, $m^2 = (1 - d^2)3/8$, $m^3 = m^4 = (1 + d)1/8$, $m^5 = (1 - d +
d^2)3/8$. This state works up to a critical dilution $d_2(a)$, where, again, a
situation with magnetizations close and monotonically
decreasing to zero is established. This scenario is analogous to the one described above and,
basically, $d_2(a)$ marks the onset of the region where dilution effects prevail.
The threshold value $d_2$ decreases slowly with a.

4.2 Noisy system: T > 0

The noisy case gives rise to a very rich phenomenology, as evidenced by the
plots shown in Fig. 8.

In the range of temperatures considered, i.e. T < 0.1, we found that, when
$d < d_1(a, T)$ and $a < a_1(T)$, the parallel ansatz (18) works; in general, $d_1(a, T)$
decreases with T and with a, consistently with what was found in the noiseless case
(see Fig. p}). Moreover, $a_1(T)$ also decreases with T, consistently with the case
d = 0 [14] (see Fig. 2): from $a_1$ onwards correlation effects become non-negligible.
For larger values of a, namely $a_1(T) < a < a_2(T)$, the perturbed correlated
ansatz (22) works, while for $a > a_2(T)$ correlation effects are so important that
a symmetric state emerges. Again, we underline the consistency with the case
d = 0 [14]: the region $a_1(T) < a < a_2(T)$ corresponds to an intermediate degree
of correlation which yields a hierarchical state, while $a > a_2(T)$ corresponds to
a high degree of correlation which induces a symmetric state (see Fig. 2).

As for the region of high dilution, we notice that when d is close to 1 the
paramagnetic state m = (0, 0, 0, ..., 0) emerges. In fact, as long as the signal
$(1 - d) + 2a(1 - d)$ is smaller than the noise T, no retrieval can be accomplished;
therefore, the condition

$$ d < 1 - T/(1 + 2a) \qquad (23) $$

must be fulfilled for $m^\mu > 0$ to hold. The system then relaxes to a
symmetric state which lasts up to intermediate dilution, where a state with "dense"
magnetizations, analogous to the one described in Sec. 4.1, emerges.

4.3 Monte Carlo simulations 

The model was also analyzed via Monte Carlo simulations, which were
implemented to determine the equilibrium values of the order parameter associated
with the following Hopfield-like Hamiltonian

$$ H = -\sum_{i<j} J_{ij} \sigma_i \sigma_j = -\frac{1}{N} \sum_{i<j} \sigma_i \sigma_j \sum_{\mu} \left[ \xi_i^\mu \xi_j^\mu + a \left( \xi_i^{\mu+1} \xi_j^\mu + \xi_i^{\mu-1} \xi_j^\mu \right) \right], \qquad (24) $$

where the coupling encodes correlation among patterns according to Eq. (12),
and pattern entries are extracted according to Eq. (15).

The dynamical microscopic variables evolve under the stochastic Glauber
dynamics [25]

$$ \sigma_i(t + \delta t) = \mathrm{sign}\{ \tanh[\beta h_i(\sigma(t))] + \eta_i(t) \}, \qquad (25) $$

where the fields $h_i = \sum_j J_{ij} \sigma_j(t)$ represent the post-synaptic potentials of the
neurons. The independent random numbers $\eta_i(t)$, distributed uniformly in [-1, 1]
(so that Eq. (25) reproduces the probability of Eq. (5)),
provide the dynamics with a source of stochasticity. The parameter $\beta = 1/T$
controls the influence of the noise on the microscopic variables $\sigma_i$. In the limit
$T \to 0$, namely $\beta \to \infty$, the process becomes deterministic and the system
evolves according to $\sigma_i(t + \delta t) = \mathrm{sign}[h_i]$.
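A compact version of such a simulation can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the results reported here: the sequential random-order sweep, the zeroed self-couplings, the function names, the small system size and all parameter values are assumptions, and the noise $\eta_i$ is drawn uniformly in [-1, 1) so that a neuron takes the value +1 with probability $[1 + \tanh(\beta h_i)]/2$.

```python
import numpy as np

def correlation_matrix(P, a):
    """Cyclic matrix of Eq. (11); assumes P >= 3."""
    X = np.eye(P)
    for mu in range(P):
        X[mu, (mu + 1) % P] += a
        X[mu, (mu - 1) % P] += a
    return X

def diluted_patterns(P, N, d, rng):
    """Pattern entries drawn from Eq. (15)."""
    return rng.choice([-1, 0, 1], size=(P, N), p=[(1 - d) / 2, d, (1 - d) / 2])

def couplings(xi, a):
    """Correlated Hebbian couplings of Eq. (12), without self-interaction."""
    P, N = xi.shape
    J = xi.T @ correlation_matrix(P, a) @ xi / N
    np.fill_diagonal(J, 0.0)
    return J

def glauber_sweep(sigma, J, beta, rng):
    """One sweep of Eq. (25), updating the neurons one at a time in random order."""
    for i in rng.permutation(sigma.size):
        h_i = J[i] @ sigma
        eta = rng.uniform(-1.0, 1.0)
        sigma[i] = 1 if np.tanh(beta * h_i) + eta > 0 else -1
    return sigma

rng = np.random.default_rng(2)
N, P, a, d, T = 400, 5, 0.3, 0.2, 0.05      # illustrative values (assumed)
xi = diluted_patterns(P, N, d, rng)
J = couplings(xi, a)

# Stimulate the first pattern; neurons with a blank entry start at random.
sigma = np.where(xi[0] != 0, xi[0], rng.choice([-1, 1], size=N))
for _ in range(20):
    sigma = glauber_sweep(sigma, J, 1.0 / T, rng)

m = xi @ sigma / N                          # Mattis magnetizations after relaxation
print(np.round(m, 2))
```

Repeating the run over several pattern realizations and over a grid of d values gives curves of the kind compared with the self-consistency equations; the sizes used here are far smaller than those quoted below and only serve to keep the example fast.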

In general, simulations were carried out using lattices consisting of $10^4$ "neurons"
and averaging over statistical samples composed of $10^2$ realizations. For
each realization of the pattern set $\{\xi^\mu\}_{\mu=1,...,P}$, the equilibrium values of the
Mattis magnetizations were determined as a function of d; the degree of dilution
in pattern entries was incremented in steps of $\Delta d = 0.01$, by sequentially setting
entries of the P vectors equal to zero, in agreement with the distribution (15).

Overall, there is a very good agreement between results from MC simulations,
from the numerical solution of the self-consistency equations and from analytical
investigations (see Fig. 8).

5 Extended Boltzmann machine 

It is possible to get a deeper insight into the behavior of the system from the
perspective of Boltzmann Machines (BMs), exploiting the approach first
introduced in [5T]. In particular, it was shown that a "hybrid" BM characterized by
a bipartite topology (where the two parties are made up of N visible units $\sigma_i$
and of P hidden units $z_\mu$, respectively), after a marginalization over the
(analog) hidden units, turns out to be (thermodynamically) equivalent to a Hopfield
network. In this equivalence the N visible units play the role of neurons and the
link connecting $\sigma_i$ to $z_\mu$ is associated with a weight $\xi_i^\mu$. The term "hybrid" refers
to the choice of the variables associated with units: the visible units are binary
($\sigma_i \in \{-1, +1\}$), as in a Restricted Boltzmann Machine, while the hidden ones
are analog ($z_\mu \in \mathbb{R}$), as in a Restricted Diffusion Network.

Figure 8: Magnetization m versus degree of dilution for fixed P = 5, T = 0.0001
and a = 0.3 (left panel) or a = 0.7 (right panel). Results from the numerical solution
of Eq. (17) (dashed, thick line) and Monte Carlo simulations (solid, thin lines)
with associated error (shadows) are compared showing, overall, a very good
agreement.

As we are going to show, this picture can be extended to include also the
correlation among attractors and the dilution in pattern entries. More precisely,
we introduce an additional layer made up of P "boxes", which switches the
signal on the two hidden variables $z_\mu$ and $z_{\mu+1}$ (see Fig. 9).

Such boxes do not correspond to any dynamical variable, but they retain a
structural function as they properly organize the interactions between the two
"active" layers: the binary layer is linked to the boxes by a synaptic matrix $\xi$, and the
boxes are in turn connected to the analog layer by a "connection matrix" that
we call X. The synaptic matrix $\xi$ is P x N dimensional, each row $\xi^\mu$ being a
stored pattern. A link between the discrete neuron $\sigma_i$ and the $\mu$-th box is drawn
with weight $\xi_i^\mu$, which takes values in the alphabet {-1, 0, 1} following a proper
probability distribution. A null weight corresponds to a lack of link, that is,
we are introducing a random dilution in the left part of the structure. On the other
hand, the matrix X is P x P dimensional and is meant to recover the correlation
among the stored patterns. Here, we choose $\xi$ according to Eq. (15) and X such as
to recover [12] [13] [14], namely

$$ X_{\mu\nu} = c\, \delta_{\mu,\nu} + b\, \delta_{\mu,\nu-1}, \qquad (26) $$

where c and b are parameters tuning the strength of correlation between
consecutive pattern entries (vide infra). More complex and intriguing choices of
X could be implemented, possibly related to a closer adherence to biology.

Figure 9: Schematic representation of a hybrid BM, with N = 5 visible nodes
(circles) and P = 3 hidden nodes (triangles). The number of boxes (squares) is P as well. The
average number of links stemming from visible units is 2, due to dilution. The
link between the i-th visible unit and the $\mu$-th box is $\xi_i^\mu$; the link between the
$\mu$-th box and the $\mu$-th [($\mu$ + 1)-th] hidden unit is c [b].

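The dynamics of the hidden and visible layers are quite different. As
explained in [31], the activity in the analog layer follows an Ornstein-Uhlenbeck
(OU) diffusion process as

$$ \tau \dot{z}_\mu = -z_\mu + \beta \varphi_\mu + \sqrt{2\tau}\, \zeta_\mu(t), \qquad (27) $$

where $-z_\mu$ represents a leakage term, $\varphi_\mu$ denotes the input due to the state of
the visible layer, $\zeta_\mu$ is a white Gaussian noise with zero mean and covariance
$\langle \zeta_\mu(t) \zeta_\nu(t') \rangle = \delta_{\mu\nu} \delta(t - t')$, $\tau$ is the typical timescale and $\beta$ tunes the strength
of the input fluctuations. In vector notation the field on the analog layer is
$\varphi = X \cdot \xi \cdot \sigma / \sqrt{N}$, or, more explicitly,

$$ \varphi_\mu = \frac{1}{\sqrt{N}} \sum_{\nu=1}^{P} \sum_{i=1}^{N} X_{\mu\nu} \xi_i^\nu \sigma_i = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( c\, \xi_i^\mu + b\, \xi_i^{\mu+1} \right) \sigma_i. \qquad (28) $$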

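The activity in the digital layer follows a Glauber dynamics as

$$ \tau' \frac{d \langle m^\mu \rangle_\sigma}{dt} = -\langle m^\mu \rangle_\sigma + \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \tanh(\beta \phi_i), \qquad (29) $$

where the interaction with the hidden layer is encoded by $\phi = X \cdot \xi \cdot z$, that is,

$$ \phi_i = \sum_{\mu=1}^{P} \sum_{\nu=1}^{P} X_{\mu\nu} \xi_i^\nu z_\mu = \sum_{\mu=1}^{P} \left( c\, \xi_i^\mu + b\, \xi_i^{\mu+1} \right) z_\mu. \qquad (30) $$

The timescale of the analog dynamics (27) is assumed to be much faster than
that of the digital one (29), that is $\tau' \gg \tau$.

Since all the interactions are symmetric, it is possible to describe this system
through a Hamiltonian formulation: from the OU process of Eq. (27) we can
write, up to the noise term,

$$ \tau \dot{z}_\mu = -\partial_{z_\mu} H(z, \sigma, \xi, X), $$

being

$$ H(z, \sigma, \xi, X) = z^2/2 - \beta \sum_{\mu=1}^{P} z_\mu \varphi_\mu. \qquad (31) $$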
The partition function $Z_N(\beta; \xi, X)$ for such a system then reads

Z N (fc£,X) = E / II dn{z ll )e- H <"*> x \ (32) 

cr •* /j,= l 

where d^,{z^) is the Gaussian weight obtained integrating the leakage term in 
the OU equation. 

Now, by performing the Gaussian integration, we get 

Z NtP m,*) = {2^ P/2 Y.^^ U< = W P/2 J2e-% H ^>*\ (33) 

a a 

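where

$$H(\sigma;\xi,X) = -\frac{1}{2}\varphi^2 = -\frac{1}{2N}\,\sigma^T\cdot\xi^T X^T X\,\xi\cdot\sigma, \qquad (34)$$

which corresponds to a Hopfield model with patterns $\tilde\xi = X\xi$ under the shift $\beta^2 \to \beta$. We then call $\mathbf{X} = X^T X$ the correlation matrix, which is obviously symmetric, so that the interactions between the $\sigma$'s are symmetric, leading to an equilibrium scenario. Using Eq. (26), the matrix $\mathbf{X}$ is

$$\mathbf{X}_{\mu\nu} = (c^2 + b^2)\,\delta_{\mu\nu} + cb\,(\delta_{\mu,\nu+1} + \delta_{\mu,\nu-1}), \qquad (35)$$

and we can fix $b^2 + c^2 = 1$ and $bc = a$ to recover the coupling in Eq. (12). It is easy to see that, as long as $b, c \in \mathbb{R}$, $a \le 1/2$. In general, with some algebra, we get

$$c = \pm\frac{1}{2}\left(\sqrt{1+2a} \pm \sqrt{1-2a}\right), \qquad (36)$$

$$b = \pm\frac{1}{2}\left(\sqrt{1+2a} \mp \sqrt{1-2a}\right), \qquad (37)$$

therefore, the product $X\cdot\xi$ appearing in both fields $\varphi_\mu$ (see Eq. 28) and $\phi_i$ (see Eq. 30) turns out to be

$$(X\cdot\xi)_{\mu,i} = \pm\frac{1}{2}\left[\sqrt{1+2a}\,(\xi_i^\mu + \xi_i^{\mu+1}) \pm \sqrt{1-2a}\,(\xi_i^\mu - \xi_i^{\mu+1})\right]. \qquad (38)$$

Thus, when $a < 1/2$, $(X\cdot\xi)_{\mu,i} \in \mathbb{R}$, $\forall \mu, i$, while for $a > 1/2$, $(X\cdot\xi)_{\mu,i}$ can be either real or pure imaginary, according to whether the $\mu$-th entry and the following $(\mu+1)$-th are aligned or not.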
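As a quick numerical sanity check of the relations above, the following sketch picks the real solution with upper signs, builds the bidiagonal matrix X, and forms the product X^T X. The cyclic closure of the pattern index (pattern P+1 identified with pattern 1) and the helper names `bc_from_a`, `build_X` and `gram` are assumptions made for illustration only.

```python
import math

def bc_from_a(a):
    """Real solution of b^2 + c^2 = 1, b*c = a (upper signs chosen)."""
    if not -0.5 <= a <= 0.5:
        raise ValueError("real b, c require |a| <= 1/2")
    s, r = math.sqrt(1 + 2 * a), math.sqrt(1 - 2 * a)
    return (s - r) / 2, (s + r) / 2  # (b, c)

def build_X(P, b, c):
    """X[mu][nu] = c*delta(mu, nu) + b*delta(mu, nu-1); cyclic index is an assumption."""
    X = [[0.0] * P for _ in range(P)]
    for mu in range(P):
        X[mu][mu] += c
        X[mu][(mu + 1) % P] += b
    return X

def gram(X):
    """Return the matrix product X^T X."""
    P = len(X)
    return [[sum(X[k][i] * X[k][j] for k in range(P)) for j in range(P)]
            for i in range(P)]

a = 0.3
b, c = bc_from_a(a)
G = gram(build_X(5, b, c))  # expected: 1 on the diagonal, a on neighbouring diagonals
```

With these choices the diagonal of `G` equals b^2 + c^2 and the two neighbouring (cyclic) diagonals equal bc, which is the structure of the correlation matrix.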

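Having described the behavior of the fields, we can now investigate more deeply the dynamics of the Boltzmann machine underlying our generalized Hopfield model.

Let us write down explicitly the two coupled stochastic Langevin equations (namely one OU process for the hidden layer, and one Glauber process for the Hopfield neurons) as

$$\tau\,\dot z_\mu = -z_\mu + \frac{\beta}{\sqrt N}\sum_{i}\left(c\,\xi_i^\mu + b\,\xi_i^{\mu+1}\right)\sigma_i, \qquad (39)$$

$$\tau'\,\frac{d\langle m_\mu\rangle}{dt} = -\langle m_\mu\rangle + \left\langle \xi^\mu \tanh\!\left[\frac{\beta}{\sqrt N}\sum_{\nu=1}^{P} z_\nu\left(c\,\xi^\nu + b\,\xi^{\nu+1}\right)\right]\right\rangle. \qquad (40)$$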
Note that by assuming thermalization of the fastest variables with respect to the dynamical evolution of the magnetizations, namely requiring $\dot z_\mu = 0$, we can use Eq. (39) to make the term $z_\nu$ in the argument of the hyperbolic tangent in Eq. (40) explicit, hence recovering the self-consistencies of Eq. (17) (see also Appendix C).

Assuming that the two time scales belong to two distinct time sectors, it is possible to proceed in the opposite way, that is

$$\frac{d\langle m_\mu\rangle}{dt} = 0 \;\Rightarrow\; \langle m_\mu\rangle = \left\langle \xi^\mu \tanh\!\left[\frac{\beta}{\sqrt N}\sum_{\nu=1}^{P} z_\nu\left(c\,\xi^\nu + b\,\xi^{\nu+1}\right)\right]\right\rangle. \qquad (41)$$



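For the sake of simplicity let us deal with the $P = 2$ case, the generalization to the case $P > 2$ being straightforward. Linearization implies

$$\tau\,\dot z_1 = -z_1 + \beta^2\left\langle (c\,\xi^1 + b\,\xi^2)\left[z_1(c\,\xi^1 + b\,\xi^2) + z_2(c\,\xi^2 + b\,\xi^1)\right]\right\rangle, \qquad (42)$$

$$\tau\,\dot z_2 = -z_2 + \beta^2\left\langle (c\,\xi^2 + b\,\xi^1)\left[z_1(c\,\xi^1 + b\,\xi^2) + z_2(c\,\xi^2 + b\,\xi^1)\right]\right\rangle, \qquad (43)$$

which, recalling that $c^2 + b^2 = 1$ and $cb = a$, turn out to be

$$\tau\,\dot z_1 = z_1\left[-1 + (1-d)\beta^2\right] + z_2\left[2a(1-d)\beta^2\right], \qquad (44)$$

$$\tau\,\dot z_2 = z_2\left[-1 + (1-d)\beta^2\right] + z_1\left[2a(1-d)\beta^2\right]. \qquad (45)$$

It is convenient to rotate the plane variables $z_1, z_2$ and define

$$x = z_1 + z_2, \qquad (46)$$

$$y = z_1 - z_2, \qquad (47)$$

such that Eqs. 42 and 43 can be restated as

$$\tau\,\dot x = -x\left[1 - (1-d)\beta^2(c+b)^2\right], \qquad (48)$$

$$\tau\,\dot y = -y\left[1 - (1-d)\beta^2(c-b)^2\right], \qquad (49)$$

which, in terms of the parameter $a$, are

$$\tau\,\dot x = -x\left[1 - (1-d)\beta^2(1+2a)\right], \qquad (50)$$

$$\tau\,\dot y = -y\left[1 - (1-d)\beta^2(1-2a)\right], \qquad (51)$$

whose solution is

$$x(t) = x(0)\,e^{\Upsilon_x t} = x(0)\exp\left[-\frac{t}{\tau}\left(1 - (1-d)\beta^2(1+2a)\right)\right],$$

$$y(t) = y(0)\,e^{\Upsilon_y t} = y(0)\exp\left[-\frac{t}{\tau}\left(1 - (1-d)\beta^2(1-2a)\right)\right]. \qquad (52)$$

The Lyapunov exponents of the dynamical system, $\Upsilon_x$ and $\Upsilon_y$, turn out to be

$$\Upsilon_x = -\frac{1}{\tau}\left[1 - \beta^2(1-d)(1+2a)\right], \qquad (53)$$

$$\Upsilon_y = -\frac{1}{\tau}\left[1 - \beta^2(1-d)(1-2a)\right]. \qquad (54)$$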
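The pattern averages entering the linearized P = 2 equations, and the resulting exponents, can be checked by exact enumeration. The sketch below assumes that pattern entries are independent, with P(xi = 0) = d and P(xi = +1) = P(xi = -1) = (1 - d)/2, and restricts to a <= 1/2 so that b and c are real; the function names are illustrative only.

```python
import itertools
import math

def pattern_average(f, d):
    """Exact average of f(xi1, xi2) over two independent diluted entries
    (assumed law: P(0) = d, P(+1) = P(-1) = (1 - d) / 2)."""
    prob = {0: d, 1: (1 - d) / 2, -1: (1 - d) / 2}
    return sum(prob[x1] * prob[x2] * f(x1, x2)
               for x1, x2 in itertools.product((-1, 0, 1), repeat=2))

def lyapunov_exponents(a, d, beta, tau=1.0):
    """Growth rates of x = z1 + z2 and y = z1 - z2 for the linearized flow."""
    s, r = math.sqrt(1 + 2 * a), math.sqrt(1 - 2 * a)
    b, c = (s - r) / 2, (s + r) / 2            # b^2 + c^2 = 1, b*c = a
    diag = pattern_average(lambda x1, x2: (c * x1 + b * x2) ** 2, d)
    off = pattern_average(lambda x1, x2: (c * x1 + b * x2) * (c * x2 + b * x1), d)
    m11 = (-1 + beta ** 2 * diag) / tau        # self-coupling of each z
    m12 = beta ** 2 * off / tau                # cross-coupling between z1 and z2
    return m11 + m12, m11 - m12                # (rate of x, rate of y)

a, d, beta, tau = 0.2, 0.3, 1.5, 2.0
ups_x, ups_y = lyapunov_exponents(a, d, beta, tau)
expected_x = -(1 - beta ** 2 * (1 - d) * (1 + 2 * a)) / tau
expected_y = -(1 - beta ** 2 * (1 - d) * (1 - 2 * a)) / tau
```

The enumeration gives (1 - d) for the diagonal average and 2a(1 - d) for the cross average, so the two eigen-directions decouple with the rates written in closed form.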
This dynamic scenario can be summarized as follows. If the noise level is high ($\beta \ll 1$), the dynamics is basically quenched on its fixed point $x = 0$, $y = 0$, and the corresponding Hopfield model is in the ergodic phase. If the noise level is reduced below the critical threshold, then two behaviors may appear: if $a < 1/2$ both $x$ and $y$ increase, which means that only one $z$ variable is moving away from its trivial equilibrium state (this corresponds to the retrieval of a single pattern in the generalized Hopfield counterpart); if $a > 1/2$, $x$ increases while $y$ points to zero, which means that both the variables $z_1, z_2$ are moving away from their trivial equilibrium values (this corresponds to a correlated retrieval in the generalized Hopfield counterpart).
Switching to the original variables we get

$$z_1(t) = \exp\left[-\frac{t}{\tau}\left(1 - (1-d)\beta^2\right)\right]\left(z_1(0)\cosh\frac{2a(1-d)\beta^2\,t}{\tau} + z_2(0)\sinh\frac{2a(1-d)\beta^2\,t}{\tau}\right),$$

$$z_2(t) = \exp\left[-\frac{t}{\tau}\left(1 - (1-d)\beta^2\right)\right]\left(z_1(0)\sinh\frac{2a(1-d)\beta^2\,t}{\tau} + z_2(0)\cosh\frac{2a(1-d)\beta^2\,t}{\tau}\right).$$

Again, Lyapunov exponents describe a dynamics in agreement with the statistical mechanics findings.
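The closed-form trajectories in the original variables can be compared against a direct numerical integration of the linearized P = 2 system. The following is a minimal sketch using a hand-written fourth-order Runge-Kutta step; the parameter values and function names are arbitrary illustrative choices, not taken from the paper.

```python
import math

def z_closed_form(t, z0, a, d, beta, tau):
    """Closed-form z1(t), z2(t) of the linearized P = 2 dynamics."""
    g = (1 - d) * beta ** 2
    decay = math.exp(-(t / tau) * (1 - g))
    s = 2 * a * g * t / tau
    z1 = decay * (z0[0] * math.cosh(s) + z0[1] * math.sinh(s))
    z2 = decay * (z0[0] * math.sinh(s) + z0[1] * math.cosh(s))
    return z1, z2

def z_rk4(t, z0, a, d, beta, tau, steps=2000):
    """Integrate tau*dz/dt = M z with classical RK4, M as in the linearized equations."""
    g = (1 - d) * beta ** 2

    def rhs(z1, z2):
        return ((z1 * (g - 1) + z2 * 2 * a * g) / tau,
                (z2 * (g - 1) + z1 * 2 * a * g) / tau)

    h = t / steps
    z1, z2 = z0
    for _ in range(steps):
        k1 = rhs(z1, z2)
        k2 = rhs(z1 + h / 2 * k1[0], z2 + h / 2 * k1[1])
        k3 = rhs(z1 + h / 2 * k2[0], z2 + h / 2 * k2[1])
        k4 = rhs(z1 + h * k3[0], z2 + h * k3[1])
        z1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        z2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return z1, z2

params = dict(a=0.2, d=0.3, beta=0.8, tau=1.5)
exact = z_closed_form(3.0, (1.0, -0.4), **params)
numeric = z_rk4(3.0, (1.0, -0.4), **params)
```

Agreement between `exact` and `numeric` only confirms the algebra of the linearized system; it says nothing about the nonlinear regime far from the fixed point.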



6 Discussion 

While technology becomes more and more automated, our need for a systemic description of cybernetics, able to go beyond the purely mechanistic approach, gets more urgent. Among the several approaches tried in this sense, neural networks, with their feedback loops among neurons, the multitude of their stable states and their stability under attacks (the latter being noise, dilution or various perturbations), seem definitely promising and worth further investigation.

Along this line, in this work we considered a complex perturbation of the paradigmatic Hopfield model, by assuming correlation among patterns of stored information and dilution in pattern entries. First, we reviewed and deepened both the limiting cases, corresponding to a Hopfield model with correlated attractors (introduced and developed by Amit, Cugliandolo, Griniasty and Tsodyks [TJ1 [TJl [TJ]) and to a Hopfield model with diluted patterns (introduced by some of us dUEI]). The general case, displaying a correlation parameter $a > 0$ and a degree of dilution $d > 0$, has been analyzed from different perspectives, obtaining a consistent and broad description. In particular, we showed that the system exhibits a very rich behavior depending qualitatively on $a$, on $d$ and on the noise $T$: in the phase space there are regions where the pure-state ansatz is recovered, and others where several patterns can be retrieved simultaneously; such parallel retrieval can be highly hierarchical, rather homogeneous, or even symmetric.

Further, recalling that interactions among spins are symmetric and therefore a Hamiltonian description is always achievable, we can look at the system as the result of the marginalization of a suitable (restricted) Boltzmann machine made of two layers (a visible, digital layer built of the Hopfield neurons and a hidden, analog layer made of continuous variables) interconnected by a passive layer of bridges allowing for pattern correlations. In this way the dynamics of the system can be addressed as well.
Appendices



Appendix A. - In this Appendix we provide some insights into the shape of 
the attractors emerging for the correlated model in the noiseless case. We recall 
for consistency the coupling 

where the pattern matrix $\xi$ is quenched. Due to the definition above, magnetizations are expected to reach a hierarchical structure, where the largest one, say $m^1$, corresponds to the stimulus and the remaining ones are symmetrically decreasing, say

$$m^1 \ge m^2 = m^P \ge \dots \ge m^{(P+1)/2} = m^{(P+1)/2+1}, \qquad (56)$$

where we assumed $P$ to be odd. The distance between the pattern $\mu$ and the stimulated pattern is $k(\mu,P) = \min[\mu - 1, P - (\mu - 1)]$.

Moreover, each pattern $\mu$ determines a field $h^\mu$, which tends to align the $i$-th spin with $\xi_i^\mu$. The field reads

$$h^\mu = m^\mu + a\,(m^{\mu+1} + m^{\mu-1}). \qquad (57)$$

At zero fast noise we have that

$$\sigma_i = \mathrm{sign}(\varphi_i) = \mathrm{sign}\left(\sum_{\mu=1}^{P}\xi_i^\mu\,h^\mu\right). \qquad (58)$$



Due to Eqs. (56) and (57), the first pattern is likely to be associated with a large field and therefore to determine the sign of the overall sum appearing in Eq. (57). On the other hand, patterns with $\mu$ close to $(P+1)/2$ are unlikely to give an effective contribution to $\varphi_i$ and therefore to align the corresponding spins. Indeed, the field $h^\nu\xi_i^\nu$ may determine the sign of $\varphi_i$ for special arrangements of the patterns $\mu$ corresponding to smaller distance, i.e. $k(\mu,P) < k(\nu,P)$. More precisely, their configuration must be staggered, i.e., under gauge symmetry, $\xi_i^1 = +1$, $\xi_i^2 = \xi_i^P = -1$, $\xi_i^3 = \xi_i^{P-1} = +1$, ..., $\xi_i^{\nu-1} = \xi_i^{P-\nu+3}$. By counting such configurations one gets $m^\nu$.

With some abuse of language, in the following we will denote by $m_k$ the Mattis magnetization corresponding to patterns at a distance $k$ from the first one. For simplicity, we also assume $P$ small, such that $m_k \neq 0$, $\forall k$.

Then, it is easy to see that, over the $2^P$ possible pattern configurations, those which effectively contribute to $m_{(P-1)/2}$ are only 4. In fact, it must be $\xi^{(P+1)/2} = \xi^{(P+1)/2+1} = +1\,(-1)$ and all the remaining ones must be staggered; therefore, $m_{(P-1)/2} = 4/2^P = 2^{2-P}$.

As for $m_{(P-3)/2}$, contributions come from configurations where the patterns corresponding to $\mu < (P-1)/2$ are staggered. Such configurations are $2^4$, but we need to exclude those which are actually ruled by the farthest patterns, which are 4; hence, the overall contribution is $16 - 4 = 12$ and $m_{(P-3)/2} = 12/2^P = 3 \times 2^{2-P}$.

We can proceed analogously for the following contributions. In general, denoting by $c_k$ the $k$-th contribution, one has the following recursive expression

$$c_{k-1} = 2^{2k} - c_k \qquad (59)$$

with $c_{(P-1)/2} = 4$ and $k < (P-1)/2$. For the last contribution, one has $c_{k-1} = 2^{2k-1} - c_k$, because the last pattern has no "twin".
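Applying this result we get

$$m = \frac{1}{2}(1, 1, 1), \quad \text{for } P = 3 \qquad (60)$$

$$m = \frac{1}{8}(5, 3, 1, 1, 3), \quad \text{for } P = 5 \qquad (61)$$

$$m = \frac{1}{32}(19, 13, 3, 1, 1, 3, 13), \quad \text{for } P = 7 \qquad (62)$$

$$m = \frac{1}{128}(77, 51, 13, 3, 1, 1, 3, 13, 51), \quad \text{for } P = 9, \qquad (63)$$

consistently with [13], [14].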
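The counting argument can be turned into a few lines of code that reproduce the vectors listed above. The sketch below re-indexes the recursion by a step counter j (j = 1 for the farthest pair of patterns), which is an illustrative choice and not the index k used in the text; it is only meant for small odd P, where all magnetizations are non-zero.

```python
from fractions import Fraction

def hierarchical_magnetizations(P):
    """Magnetizations (m^1, ..., m^P) for odd P from the configuration counting.
    The step index j (1 = farthest pair) is a hypothetical re-indexing."""
    if P < 3 or P % 2 == 0:
        raise ValueError("P must be odd and >= 3")
    half = (P - 1) // 2
    counts = [4]                          # farthest pair: 4 configurations out of 2^P
    for j in range(2, half + 1):          # closer pairs: 2^(2j) minus those already ruled
        counts.append(2 ** (2 * j) - counts[-1])
    first = 2 ** P - counts[-1]           # stimulated pattern has no twin
    by_distance = [first] + counts[::-1]  # distance 0, 1, ..., half
    return [Fraction(by_distance[min(mu, P - mu)], 2 ** P) for mu in range(P)]

m5 = hierarchical_magnetizations(5)
m9 = hierarchical_magnetizations(9)
```

Using exact fractions avoids rounding, so each result can be compared directly with the corresponding vector of integers divided by its common prefactor.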

Let us now consider the case $P = 11$; following the previous machinery we get $m = (307, 205, 51, 13, 3, 1, 1, 3, 13, 51, 205)$. However, such a state is not stable over the whole range of $a$. In fact, by requiring that the field due to the farthest pattern is larger than the field generated by the staggered configuration of patterns we get

2(2a — 1)(— me/2 + m§ — 777,4 + — m-i) < mi (64) 

which implies $a < 23/42 \approx 0.54$. Hence, from that value of $a$, the previous state is replaced by $m = (77, 51, 13, 3, 1, 0, 0, 1, 3, 13, 51)$, which is always stable. Similarly, for $P = 13$, we get a state with $m_i > 0$, $\forall i$, which is stable only when $a < 85/164 \approx 0.518$; for larger values of $a$ this is replaced by the state found for $P = 11$ and then by the state found for $P = 9$.

All these results have been quantitatively confirmed numerically. We finally notice that for the arguments presented here there is no need for the low storage hypothesis.

Appendix B. - In this appendix we want to show that the model is well behaved, namely, that its intensive free energy has a thermodynamic limit that exists and is unique. Although this may look like a redundant check, we stress that the thermodynamic limit of the high storage Hopfield model (e.g. the $\alpha > 0$ case) is still lacking, hence rigorous results on its possible variants still deserve some interest.

To obtain the desired result, our approach follows two steps: first we show, via annealing, that the intensive free energy is bounded in the system size, then we show that it is also super-additive. As a consequence of these two results the statement follows directly [26].

Recall that $F_N(\beta,a,d) = N^{-1}\,\mathbb{E}\ln Z_N(\beta,a,d)$, where

$$Z_N(\beta,a,d) = \sum_\sigma e^{-\beta H_N(\sigma,\xi)}$$

is the partition function. Annealing the free energy consists in considering the following bound

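$$F_N(\beta,a,d) = \frac{1}{N}\,\mathbb{E}\log\sum_\sigma e^{-\beta H_N(\sigma,\xi)} \qquad (65)$$

$$= \frac{1}{N}\Big\langle\log\sum_\sigma e^{-\beta H_N(\sigma,\xi)}\Big\rangle \qquad (66)$$

$$\le \frac{1}{N}\log\Big\langle\sum_\sigma e^{-\beta H_N(\sigma,\xi)}\Big\rangle, \qquad (67)$$

where in the last line we used Jensen's inequality.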
As a result we get 

Z N (j3, a, d) = ^2 ^{^ m i^ +a ^ m ^^ m ^ +1 ^ + ^ m ^^ m ^- 1< -' 7 ^} (68) 

< 2 Jv e? (JV - 1) ^ (1+2a)(1 - d) ' (69) 
by which the annealed free energy bound reads off as 

F N (/3, M) < ln2 + ^(3(1 + 2o)(l - d) (l + j^j , (70) 

such that the annealed free energy is $F_A(\beta,a,d) = \ln 2 + P\beta(1+2a)(1-d)/2$.
Let us now move toward proving the super-additivity property and consider two systems independent of each other and of the original $N$-neuron model, made of $N_1$ and $N_2$ neurons respectively, such that $N = N_1 + N_2$. In complete analogy with the original system we can introduce

$$m_\mu^{(1)}(\sigma) = \frac{1}{N_1}\sum_{i=1}^{N_1}\xi_i^\mu\sigma_i, \qquad m_\mu^{(2)}(\sigma) = \frac{1}{N_2}\sum_{i=N_1+1}^{N}\xi_i^\mu\sigma_i,$$

and note that the original Mattis magnetizations are linear combinations of the sub-system counterparts, such that

$$m_\mu = \frac{N_1}{N}\,m_\mu^{(1)} + \frac{N_2}{N}\,m_\mu^{(2)}.$$

Since the function ir — > x 2 is convex (and the translation x — > innocent) we 
have 

^jv(/3, a, d) < E e ^ E ^ m ^ 1) ' 2(CT)+ 4 ro - )(CT)m ^i (<T)+ < )(<T)m ^-i (ff) ] } 

E^{™< 2) ' 2 (-)+a[< ) (-)« l < 2 -| 1 (-)+™< l 2) (-)«^ 1 M] } 

= Z Nl (p,a,d)Z N2 (/3,a,d), (71) 

by which the free energy density $F_N(\beta,a,d)$ is shown to be super-additive, as

$$N F_N(\beta,a,d) \ge N_1 F_{N_1}(\beta,a,d) + N_2 F_{N_2}(\beta,a,d).$$

As the free energy density is super-additive and bounded (the latter being an obvious consequence of the annealed bound), the infinite volume limit exists, is unique and is equal to its sup over the system size: $\lim_{N\to\infty} F_N(\beta,a,d) = \sup_N F_N(\beta,a,d) = F(\beta,a,d)$.

Appendix C. - In this Appendix we outline the statistical mechanics calculations that lead to the self-consistency equations used in the text (Eq. 13). Our calculations are based on the Hamilton-Jacobi interpolation technique 17\ [TU] [T%]. This appendix has two different aims. On the one hand, it outlines the physics of the model and describes it through the self-consistent equation; on the other hand, it develops a novel mathematical technique able to solve this kind of statistical mechanics problem.

In a nutshell, the idea is to think of $\beta$ as a "time variable" and to introduce $P$ fictitious axes $x_\mu$, meant as "space variables"; then, within a Hamilton-Jacobi framework, the free energy with respect to these Euclidean coordinates is shown to play the role of the Principal Hamilton Function, whose solution can then be extrapolated from classical mechanics.

Our generalization of the Hopfield model is described by the Hamiltonian:



H N (a, Z) = -±fl E ( 72 ) 
as discussed in the text (see Sects. 2 and 3). 

The $N$-neuron partition function $Z_N(\beta,a,d)$ and the free energy $F(\beta,a,d)$ can be written as

$$Z_N(\beta,a,d) = \sum_\sigma \exp[-\beta H_N(\sigma,\xi)], \qquad (73)$$

$$F(\beta,a,d) = \lim_{N\to\infty}\frac{1}{N}\langle\log Z_N(\beta,a,d)\rangle, \qquad (74)$$

where $\langle\cdot\rangle$ again denotes the full average over both the distribution of the quenched patterns $\xi$ and the Boltzmann weight (for the sake of clarity, let us stress that the factor $B(\beta,a,d) = \exp[-\beta H_N(\sigma,\xi)]$ is termed the Boltzmann factor).

As anticipated, the idea of the Hamilton-Jacobi interpolation is to enlarge the "space of the parameters" by introducing a $(P+1)$-dimensional Euclidean structure (where $P$ dimensions are of space type and mirror the $P$ Mattis magnetizations, while the remaining one is of time type and mirrors the temperature dependence) and to find a general solution for the free energy in this space thanks to techniques stemming from classical mechanics. The statistical mechanics free energy will then simply be this extended free energy evaluated at a particular point of this larger space. Analogously, the average $\langle\cdot\rangle_{(x,t)}$ extends the ones introduced earlier by accounting for this generalized Boltzmann factor and will be denoted by $\langle\cdot\rangle$ whenever evaluated in the sense of statistical mechanics.
The "Euclidean" free energy for $N$ neurons, namely $F_N(t,x)$, can then be written in vectorial terms as



F N (t t x) = -j= ( In J 5>xp [J.(£ :V.>> (x.i^ ) 



2A' 



(75) 



The matrix X can be diagonalized trough X = DU, where U and W are 
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(unitary) rotation matrices and D is the diagonal expression, such that 
F N (t,x) = ^(ln^ ex P 



= ^( ln E ex P 



27V 
t 

"2N 



as (fa X£a) = (£<r, lflDU&) = (VDU^a, y/DUfr) and (x, £cr) = (x, WU^a) 

we switch to the new variables £ = ^/~DU^ and x = 



(77) 
(78) 



D Uxwe can write the Euclidean free energy in a canonical form as 
F N (t,x) = ^/ln^exp(-^(|<7,£7) + (x,<£<7) 



1 / N N 

= ^( ln E-p(E^E^+E^E^. 



Thus, we write the (x, independent Boltzmann factor as 
B;v(x, t) — exp 



(79) 



remembering that $B_N(\tilde x,t)$ matches the classical statistical mechanics factor for $t = -\beta$ and $\tilde x_\mu = 0$ $\forall\mu$, as even a visual check can immediately confirm.

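Now, let us consider the derivative of the free energy with respect to each dimension (i.e., $t$, $\tilde x_\mu$):

$$\partial_t F_N(t,\tilde x) = -\frac{1}{2}\sum_{\mu=1}^{P}\langle m_\mu^2\rangle_{(\tilde x,t)}, \qquad (80)$$

$$\partial_{\tilde x_\mu} F_N(t,\tilde x) = \langle m_\mu\rangle_{(\tilde x,t)}. \qquad (81)$$

We notice that the free energy implicitly acts as a Principal Hamilton Action if we introduce the potential

$$V_N(t,\tilde x) = \frac{1}{2}\sum_{\mu=1}^{P}\left[\langle m_\mu^2\rangle - \langle m_\mu\rangle^2\right]. \qquad (82)$$

In fact, we can write the Hamilton-Jacobi equation for the action $F_N$ as

$$\partial_t F_N(t,\tilde x) + \frac{1}{2}\sum_{\mu=1}^{P}\left(\partial_{\tilde x_\mu}F_N(t,\tilde x)\right)^2 + V_N(t,\tilde x) = 0. \qquad (83)$$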
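The Hamilton-Jacobi identity can be verified numerically on a toy system by brute-force enumeration. The sketch below is a hedged illustration only: it takes a single hypothetical pattern (P = 1) on N = 6 neurons, so no quenched average is needed, uses the generalized weight exp(-t N m^2 / 2 + N x m), and estimates the derivatives of the free energy with central finite differences.

```python
import itertools
import math

XI = [1, -1, 1, 1, -1, 1]   # one hypothetical pattern (P = 1) on N = 6 neurons
N = len(XI)

def _weights(t, x):
    """Pairs (m, weight) with weight exp(-t*N*m^2/2 + N*x*m) for each spin configuration."""
    out = []
    for sigma in itertools.product((-1, 1), repeat=N):
        m = sum(xi * s for xi, s in zip(XI, sigma)) / N
        out.append((m, math.exp(-t * N * m * m / 2 + N * x * m)))
    return out

def free_energy(t, x):
    """Euclidean free energy F_N(t, x) = (1/N) ln sum_sigma weight."""
    return math.log(sum(w for _, w in _weights(t, x))) / N

def averages(t, x):
    """Return (<m>, <m^2>) under the generalized Boltzmann weight."""
    ws = _weights(t, x)
    Z = sum(w for _, w in ws)
    return (sum(m * w for m, w in ws) / Z,
            sum(m * m * w for m, w in ws) / Z)

def hj_residual(t, x, h=1e-5):
    """Left-hand side of the Hamilton-Jacobi equation; should vanish up to numerical error."""
    dF_dt = (free_energy(t + h, x) - free_energy(t - h, x)) / (2 * h)
    dF_dx = (free_energy(t, x + h) - free_energy(t, x - h)) / (2 * h)
    m1, m2 = averages(t, x)
    potential = 0.5 * (m2 - m1 ** 2)
    return dF_dt + 0.5 * dF_dx ** 2 + potential

residual = hj_residual(-0.8, 0.3)
```

The residual vanishes at any finite N because the potential is exactly the variance of the order parameter; the free motion is recovered only when that variance is dropped.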



Interestingly, the potential is the sum of the variances of the order parameters, and we know from a Central Limit Theorem argument that in the thermodynamic limit they must vanish, one by one, and, consequently, $\lim_{N\to\infty} V_N(t,\tilde x) = 0$. Such a self-averaging property plays a key role in our approach as, in the thermodynamic limit, the motion turns out to be free. Moreover, as shown in Appendix B, the limit

$$F(t,\tilde x) = \lim_{N\to\infty} F_N(t,\tilde x), \qquad (84)$$
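exists, and $F(t,\tilde x)$ can then be obtained by solving the free-field Hamilton-Jacobi problem

$$\partial_t F(t,\tilde x) + \frac{1}{2}\sum_{\mu=1}^{P}\left(\partial_{\tilde x_\mu}F(t,\tilde x)\right)^2 = 0. \qquad (85)$$

From standard arguments of classical mechanics, it is simple to show that the solution for the Principal Hamilton Function, i.e. the free energy, is the integral of the Lagrangian over time plus the initial condition (which has the great advantage of being a trivial one-body calculation, as $t = 0$ decouples the neurons). More explicitly,

$$F(t,\tilde x) = F(t_0,\tilde x_0) + \int_{t_0}^{t} dt'\,\mathcal{L}(t',\tilde x), \qquad (86)$$

where the Lagrangian can be written as

$$\mathcal{L}(t,\tilde x) = \frac{1}{2}\sum_{\mu=1}^{P}\left(\partial_{\tilde x_\mu}F(t,\tilde x)\right)^2 = \frac{1}{2}\sum_{\mu=1}^{P}\langle m_\mu\rangle^2. \qquad (87)$$

Having neglected the potential, the motion must be constrained to straight hyperplanes, and the Cauchy problem is

$$t_0 = 0, \qquad \tilde x_\mu = \tilde x_\mu^0 + t\,\langle m_\mu\rangle. \qquad (88)$$

We can now write the solution more explicitly as

$$F(t,\tilde x) = F(0,\tilde x_0) + \int_0^t dt'\,\mathcal{L}(t',\tilde x)$$

$$= \frac{t}{2}\sum_{\mu}\langle m_\mu\rangle^2 + \lim_{N\to\infty}\frac{1}{N}\Big\langle\ln\sum_\sigma\exp\Big(\sum_{\mu}\tilde x_\mu^0\sum_{i}\tilde\xi_i^\mu\sigma_i\Big)\Big\rangle$$

$$= \ln 2 + \frac{t}{2}\sum_{\mu}\langle m_\mu\rangle^2 + \Big\langle\ln\cosh\Big(\sum_{\mu}\tilde x_\mu^0\,\tilde\xi^\mu\Big)\Big\rangle. \qquad (89)$$

As a consequence, the free energy of this generalization of the Hopfield model can be written by choosing $t = -\beta$ and $\tilde x_\mu = 0$ for all the spatial dimensions, so as to have

$$F(\beta,a,d) = \ln 2 - \frac{\beta}{2}\sum_{\mu}\langle m_\mu\rangle^2 + \Big\langle\ln\cosh\Big[\beta\sum_{\mu}\tilde\xi^\mu\langle m_\mu\rangle\Big]\Big\rangle. \qquad (90)$$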



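We can proceed to extremization, namely $\partial_{\langle m_\mu\rangle}F(\beta,a,d) = 0$, to get

$$\langle m_\mu\rangle = \Big\langle\tilde\xi^\mu\tanh\Big[\beta\sum_{\nu}\tilde\xi^\nu\langle m_\nu\rangle\Big]\Big\rangle, \qquad (91)$$

which, turning to the original variables (with $\mathbf{X}$ the correlation matrix), can be written as

$$F(\beta,a,d) = \ln 2 - \frac{\beta}{2}\,(m,\mathbf{X}m) + \big\langle\ln\cosh\beta(\xi,\mathbf{X}m)\big\rangle, \qquad (92)$$

$$\langle m_\mu\rangle = \big\langle\xi^\mu\tanh[\beta(\xi,\mathbf{X}m)]\big\rangle. \qquad (93)$$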
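The self-consistency equations in the original variables can be solved by plain fixed-point iteration for small P, averaging exactly over the pattern entries. The sketch below assumes a cyclic correlation matrix (1 on the diagonal, a on the two neighbouring diagonals, which sum to 2a when P = 2) and independent entries with P(0) = d and P(+1) = P(-1) = (1 - d)/2; the function names are hypothetical and the iteration is not guaranteed to converge for all parameters.

```python
import itertools
import math

def correlation_matrix(P, a):
    """Cyclic correlation matrix: 1 on the diagonal, a on neighbouring diagonals (assumed)."""
    X = [[0.0] * P for _ in range(P)]
    for mu in range(P):
        X[mu][mu] += 1.0
        X[mu][(mu + 1) % P] += a
        X[mu][(mu - 1) % P] += a
    return X

def solve_magnetizations(P, a, d, beta, m0, iters=500):
    """Iterate m_mu <- < xi_mu tanh(beta * (xi, X m)) > with an exact pattern average."""
    prob = {0: d, 1: (1 - d) / 2, -1: (1 - d) / 2}
    X = correlation_matrix(P, a)
    m = list(m0)
    for _ in range(iters):
        Xm = [sum(X[mu][nu] * m[nu] for nu in range(P)) for mu in range(P)]
        new = [0.0] * P
        for xi in itertools.product((-1, 0, 1), repeat=P):
            w = math.prod(prob[x] for x in xi)
            if w == 0.0:
                continue  # configuration has zero probability
            th = math.tanh(beta * sum(x * h for x, h in zip(xi, Xm)))
            for mu in range(P):
                new[mu] += w * xi[mu] * th
        m = new
    return m

# Standard Hopfield limit (a = 0, d = 0): pure state solving m = tanh(beta * m).
m = solve_magnetizations(P=2, a=0.0, d=0.0, beta=2.0, m0=[0.9, 0.0])
```

Different initial conditions `m0` select different fixed points, so scanning a, d and beta with several starting points is one simple way to explore the hierarchical and symmetric retrieval regimes described in the text.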
which are the equations that have been used throughout the text. For the sake of clarity, the expression for the Mattis magnetizations



m,, = ( f tanh 



5>„(cr+?c +i ) 



(94) 



is written extensively for P — 2, namely 
mi = - tanh[|(zi +z 2 ){c + b)} + 

+ ^"2^ tanh[^(zi - z 2 )(c- &)] - rf)tanh 







(^1 C+ Z2&) 



, (95) 



m 2 



(!- d ) 2 , A \ / (1-rf) 2 , r /9, w LX1 

V 7 tanh[|( Zl + z 2 )(c + 6)] + v 7 tanh[^(«i - z 2 )(c - &)] + 



+ d(l - d) tanh[|i((«i + z 2 )(c + 6) - (*i - - c))], 



(96) 



and for P = 3, namely 

mi = <i 2 (l — d) tanh/3 (mi - 

d(l-d) 2 , n, 
+ tanh p (mi + ^ + a(mi + 2m 2 + m 3 )) 

1/-, i\0 

I - mi)) 

- a(mi + m 2 + 2m 3 )) 



1 + a(m 2 + m 3 )) 

~ ""2 

in 9 

- m 3 + a(m 3 - 



2 

- TO 2 



+ 



+ 
+ 



Z 

1 - d) 2 

: — - — — tanh /3 (mi - 

d(l-d) 2 , a, 
tanh p (mi - 

d(l-d) 2 , „, 

tanh p (mi — m 2 + a(m 2 — mi)) 

^ — tanh (3 (mi - 
- — tanh /3 (m x - 



+ 4 

m 2 - m 3 



— 2omi) 
+ 2am 3 ) 



as for m 2 
m 2 -> m 3 



d) 3 

^ tanh/3 (mi — m 2 

+ tanh p (mi + m 2 + m 3 

, they can be obtained through direct permutation 



{ L_ fonV, « Cm. _ m , _|_ mg _|_ 2am 2 ) 

- 2a(mi 



- m 2 + m 3 )) , 



and m 3 
— > mi 



mi 
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